Do Characters Abuse More Than Words?
Authors
Abstract
Although word and character n-grams have been used as features in many NLP applications, no systematic comparison or analysis has yet demonstrated the power of character-based features for detecting abusive language. In this study, we investigate the effectiveness of such features for abusive language detection in user-generated online comments, and show that these methods outperform previous state-of-the-art approaches and other strong baselines.
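The abstract does not state which n-gram lengths, weighting scheme, or classifier were used. The sketch below only illustrates the difference between the two feature families it compares. The function names (`char_ngrams`, `word_ngrams`, `overlap`) and the default n-gram ranges are hypothetical choices made for illustration, not taken from the paper. The example shows that a deliberately misspelled token ("idi0t") shares no word n-grams with its standard spelling ("idiot") but still shares several character n-grams with it.

```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count character n-grams of lengths n_min..n_max (assumed range).

    Whitespace is normalized and the text is padded with one space on each
    side, so n-grams at word boundaries are kept distinct.
    """
    text = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> Counter:
    """Count word n-grams (unigrams and bigrams by default, assumed range)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def overlap(a: Counter, b: Counter) -> int:
    """Number of shared n-gram occurrences (multiset intersection)."""
    return sum((a & b).values())


if __name__ == "__main__":
    # An obfuscated spelling defeats exact word matching...
    print(overlap(word_ngrams("idiot"), word_ngrams("idi0t")))
    # ...but still shares character n-grams such as " id", "idi", " idi".
    print(overlap(char_ngrams("idiot"), char_ngrams("idi0t")))
```

In a full detector, count vectors like these would typically be passed to a supervised classifier; which model and weighting the authors used is not stated in the abstract.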
Similar resources
Quote Attribution for Literary Text with Neural Networks
We propose a method for using neural networks to attribute quotes in literary texts. Since previous work has been unable to solve this problem successfully using bag-of-words features, we study whether this failure is due to the limited expressiveness of such features. By re-framing the modeling of quotes and characters as based on word vectors, we hope to demonstrate that individua...
A Structural Approach for Segmentation of Handwritten Hindi Text
This paper attempts to segment handwritten Hindi words. Segmentation is complicated by the possible presence of modifiers (matras) on all sides of the basic characters and by the uncertainty that different writing styles introduce into character shapes. We have devised a structural approach to capture the similarities and differences between structure class...
Character Decomposition and Transposition Processes in Chinese Compound Words Modulates Attentional Blink
The attentional blink (AB) is the phenomenon in which the identification of the second of two targets (T2) is attenuated if it is presented less than 500 ms after the first target (T1). Although the AB is eliminated in canonical word conditions, it remains unclear whether the character order in compound words affects the magnitude of the AB. Morpheme decomposition and transposition of Chinese t...
Phonological Codes Constrain Output of Orthographic Codes via Sublexical and Lexical Routes in Chinese Written Production
To what extent do phonological codes constrain orthographic output in handwritten production? We investigated how phonological codes constrain the selection of orthographic codes via sublexical and lexical routes in Chinese written production. Participants wrote down picture names in a picture-naming task in Experiment 1 or response words in a symbol-word associative writing task in Experiment 2...
Extension of Zipf's Law to Word and Character N-grams for English and Chinese
It is shown that for a large corpus, Zipf's law for both words in English and characters in Chinese does not hold for all ranks. The frequency falls below the frequency predicted by Zipf's law for English words for rank greater than about 5,000 and for Chinese characters for rank greater than about 1,000. However, when single words or characters are combined together with n-gram words or chara...